1 Motivation

Forests are essential for life on Earth. Covering around 31 percent of the world’s total land area, they provide a retreat and home to over 80 percent of land animals and to countless plants, some of them still undiscovered. One can say that forests are the backbone of entire ecosystems. A significant part of the oxygen we breathe is provided by trees, which also absorb about 25 percent of greenhouse gases. We depend on forests economically as well: the livelihoods of about 1.6 billion people around the world are directly or indirectly connected to them. Furthermore, forests provide 40 percent of today’s global renewable energy supply, as much as solar, hydroelectric and wind power combined. Despite these benefits, forests across the world face several challenges, ranging from wildfires and human-driven deforestation to poor management and poor conservation in general. The loss of whole forests would have severe consequences for humanity and life on Earth.

With this project we seek to answer important questions that address these challenges. We want to identify the causes of forest destruction, highlight the importance of forests to our environment and predict trends in reforestation and deforestation. Moreover, we hope to show how reforestation can help tackle climate change, in particular how an increase in forest area helps to increase the buffer of sustainability. For the statistics cited so far, see our reference (opened on 7 May 2021).

2 Overview

We begin with a general overview of global forest development over the last 30 years. We then dig deeper into deforestation and reforestation by identifying the main driver countries, showing trends and investigating possible correlations. This leads to a crucial prediction: in how many years forests would be lost if humankind continues to act as it has in the past. In the next chapter, our analysis turns to forest destruction by natural causes. After a comprehensive overview with a focus on the countries most affected by forest destruction, we examine whether there is a correlation between rising temperatures and wildfires, and try to predict where and when wildfires are likely to occur. In our final chapter we relate our forest data to other environmental issues such as air pollution, greenhouse gas emissions and the carbon storage of forests. This results in a prediction of how much forest area has to be added to tackle all greenhouse gas emissions.

4 Datasets & Preprocessing

The following datasets will help us answer our questions:

4.1 FAO. 2020. Global Forest Resources Assessment 2020

The data of the Global Forest Resources Assessment (FAO) consists of 3 datasets: data on forest development for the period 1990-2020, which we mainly use to answer questions around reforestation and deforestation, data on forest disturbances for the period 2000-2017, and data on total forest area.

1.1 Data on forest development:

1.2 Data on forest disturbances:

First we remove everything that we won’t need and make the names of the columns that we need later more readable:

  • The ‘regions’ column is replaced by a continent column, which provides more intuitive information.
  • The ‘iso3’ column is dropped because it doesn’t provide any further information.
  • The name and continent columns are converted to factor variables. We also replace missing values with 0, because there is no safe way to impute these values. However, this implies that our results for this part of the project are probably an underestimation.
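
The cleaning steps above can be sketched in base R roughly as follows. This is only an illustration: the data frame and its values are invented, the `deforestation` column name is an assumption, and the region-to-continent lookup is hypothetical because the actual mapping is not shown in this report.

```r
# Hypothetical excerpt of the forest development data (values are invented).
forest <- data.frame(
  name          = c("Algeria", "Brazil", "Germany"),
  regions       = c("Northern Africa", "South America", "Western Europe"),
  iso3          = c("DZA", "BRA", "DEU"),
  deforestation = c(NA, 1500, 2),
  stringsAsFactors = FALSE
)

# Assumed lookup from region to continent (not taken from the report).
continent_of <- c("Northern Africa" = "Africa",
                  "South America"   = "South America",
                  "Western Europe"  = "Europe")
forest$continent <- unname(continent_of[forest$regions])

# Drop the columns that are not needed later.
forest <- forest[, !(names(forest) %in% c("regions", "iso3"))]

# Convert name and continent to factor variables.
forest$name      <- factor(forest$name)
forest$continent <- factor(forest$continent)

# Replace missing numeric values with 0 (this tends to underestimate totals).
numeric_cols <- vapply(forest, is.numeric, logical(1))
forest[numeric_cols] <- lapply(forest[numeric_cols], function(x) {
  x[is.na(x)] <- 0
  x
})
```

Replacing missing values with 0 keeps every country in the data, at the cost of the downward bias mentioned above.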

1.3 Data on further topics (e.g. Forest Management / Ownership) for the period 1990-2020 (not every year included).

4.2 FAO. 2020. FAOSTAT Emissions-Land Use, Forest Land dataset.

The next dataset is also from the FAO and contains data on emissions and area change due to gains and losses of carbon stocks in living tree biomass for the period 1990-2020.

4.3 FAO. 2021. FAOSTAT Temperature Change Dataset

To work on wildfire predictions due to temperature rise, we also need data on mean surface temperature change, in this case for the period 1961-2020:

## Rows: 537,370
## Columns: 10
## $ `Area Code`    <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ Area           <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ `Months Code`  <dbl> 7001, 7001, 7001, 7001, 7001, 7001, 7001, 7001, 7001, 7…
## $ Months         <chr> "January", "January", "January", "January", "January", …
## $ `Element Code` <dbl> 7271, 7271, 7271, 7271, 7271, 7271, 7271, 7271, 7271, 7…
## $ Element        <chr> "Temperature change", "Temperature change", "Temperatur…
## $ `Year Code`    <dbl> 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1…
## $ Year           <dbl> 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1…
## $ Value          <dbl> 0.746, 0.009, 2.695, -5.277, 1.827, 3.629, -1.436, 0.38…
## $ Flag           <chr> "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "…

Again we remove everything that we won’t need:

  • All ‘Code’-columns: Add no information.
  • All years that are not in 2000-2017: No forest data for these years.
  • The Unit column: it’s always °C.
  • All missing values (Flags != “Fc”) and afterwards the flag column.
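
A minimal base R sketch of these filtering steps is shown below. The rows are invented, the `Unit` column is added by hand because it does not appear in the truncated column listing above, and treating every flag other than "Fc" as missing follows the bullet list rather than any FAOSTAT documentation.

```r
# Hypothetical excerpt of the temperature change data (values are invented).
temp <- data.frame(
  `Area Code` = c(2, 2, 2, 2),
  Area        = "Afghanistan",
  Months      = "January",
  `Year Code` = c(1999, 2000, 2017, 2018),
  Year        = c(1999, 2000, 2017, 2018),
  Unit        = "\u00b0C",
  Value       = c(0.5, 1.1, NA, 0.9),
  Flag        = c("Fc", "Fc", NA, "Fc"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Remove all 'Code' columns and the Unit column.
keep_cols <- !grepl("Code", names(temp)) & names(temp) != "Unit"
temp <- temp[, keep_cols]

# Keep only the years for which forest data exists.
temp <- temp[temp$Year >= 2000 & temp$Year <= 2017, ]

# Keep only rows flagged "Fc", then drop the flag column.
# %in% is used instead of == so that missing flags count as FALSE.
temp <- temp[temp$Flag %in% "Fc", ]
temp$Flag <- NULL
```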

4.4 OECD. 2021. Air quality and health: Exposure to PM2.5 fine particles

Our fourth dataset was created by the OECD and holds data on mean population exposures to outdoor and ambient PM2.5 particles for the period 1990-2019 (not every year included).

4.5 CAIT data: Climate Watch. 2020. GHG Emissions

The CAIT dataset contains data on greenhouse gas (GHG) emissions for the period from 1990-2018.

4.6 FAO. 2021. AQUASTAT Database

Our next dataset is the AQUASTAT database from the FAO, which provides annual averages of precipitation and renewable water resources for the period 1961-1990.

4.7 Forest land of continents

The last dataset covers the forest cover of the continents.

We pre-processed this dataset as follows:

  • We handled missing values by dropping the affected rows.

4.8 Missing, Shweta’s part

5 Global Forest Development

This part of our project answers the following questions:

  • What was the global forest development over the last 30 years?
    • What are the trends? (globally)
  • Is there a correlation between air pollution and the amount of forest in a country / globally?
    • Which countries have the most air pollution?

5.2 Countries with the largest forest area

Findings:

  • The Russian Federation has the largest forest cover in 2020.
  • The top 10 countries together have more forest area than the rest of the world combined.

6 Deforestation

In this chapter we address important questions regarding deforestation, which is one part of forest destruction. We highlight it in its own chapter because it is caused directly by humans. We look at the main drivers of deforestation by country, show trends across the continents and predict how many years it would take until all forests are lost, by relating the deforestation and reforestation values of the last 30 years to each other.

6.1 Main drivers of deforestation and forest prediction

By hovering over the following map you can find, for every country, the percentage of forest lost through deforestation in the last 30 years relative to 1990, the amount of deforestation in the last 30 years, the current forest area [1000 ha] and the number of years until the forest area is completely lost, assuming deforestation continues at the rate of the last 30 years.

The color scale shows the percentage of forest lost, which makes the main drivers of deforestation visible.

Note: for some countries the deforestation data is 0 or not available. Where possible, the deforestation value for those countries was calculated as the difference in forest area between 1990 and 2020. Otherwise they are colored white.
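
The quantities shown in the tooltip could be derived along the following lines. This is a sketch under stated assumptions: the column names and values are hypothetical, the yearly loss is taken as the 30-year deforestation total divided by 30, and reforestation is not netted out, since the exact formula is not given in this report.

```r
# Hypothetical country data; all areas in 1000 ha (values are invented).
forest <- data.frame(
  country           = c("A", "B", "C"),
  area_1990         = c(1000, 500, 200),
  area_2020         = c(700, 500, 260),
  deforestation_30y = c(300, NA, 0)
)

# Fallback from the note: if deforestation is 0 or missing,
# use the difference in forest area between 1990 and 2020.
missing_def <- is.na(forest$deforestation_30y) | forest$deforestation_30y == 0
fallback    <- forest$area_1990 - forest$area_2020
forest$deforestation_30y[missing_def] <- fallback[missing_def]

# Countries without a positive loss get no prediction (white on the map).
loss <- forest$deforestation_30y
loss[!is.na(loss) & loss <= 0] <- NA

# Percentage of the 1990 forest area lost in 30 years.
forest$pct_lost <- 100 * loss / forest$area_1990

# Years until the current forest area is gone at the average yearly loss.
forest$years_left <- forest$area_2020 / (loss / 30)
```

For country A this gives a loss of 30 percent and 70 remaining years; countries B and C have no positive loss and therefore no prediction.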

7 Reforestation

In this part we dig deeper into reforestation by analyzing which countries are the main drivers, whether there is a correlation between reforestation and deforestation, and what the trends across continents are.

7.1 Main drivers of reforestation

As these results mainly show countries with a large surface area, we put the reforestation increase from 1990-2020 in relation to the forest area in 1990.

This leads to quite surprising results, e.g. that Algeria is the country with the highest reforestation increase relative to its total forest area in 1990. This does not come out of nowhere: along with other North African countries, Algeria is pursuing several reforestation projects such as the Great Green Wall or the “barrage vert” [2]. As a result, Algeria is one of the countries with a higher forest cover in 2020 than in 1990.
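
The normalization described above amounts to a simple ratio. A minimal sketch with invented numbers and hypothetical column names:

```r
# Hypothetical data in 1000 ha (values are invented).
ref <- data.frame(
  country           = c("A", "B"),
  reforestation_30y = c(50, 400),
  area_1990         = c(100, 8000)
)

# Reforestation relative to the forest area in 1990.
ref$relative_increase <- ref$reforestation_30y / ref$area_1990

# Rank countries by the relative instead of the absolute increase.
ref <- ref[order(-ref$relative_increase), ]
```

Country B reforests more in absolute terms, but country A ranks first once the 1990 forest area is taken into account.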

By hovering over the following map, a tooltip of the reforestation increase for each country is shown. The greener the country, the higher is the increase.

7.2 Relation between reforestation and deforestation

Now that we have an overview of reforestation and deforestation, we want to answer the question of whether governments try to “make up” for the deforestation of the last 30 years.

First, to get an initial impression, we show the relation between total reforestation and total deforestation over the last 30 years.

There are many outliers with either very high deforestation or very high reforestation figures. Such outliers are not surprising in light of recent news. For example Brazil, a country with one of the largest rainforest areas, has been making negative headlines for years with its environmental policy [1].

We now zoom in to have a closer look at the data without the outliers by narrowing the scale.

The figure already shows a widely spread, non-linear distribution of our data.

With the Shapiro-Wilk test we check whether our data is normally distributed.

## 
##  Shapiro-Wilk normality test
## 
## data:  corref$totalref
## W = 0.25387, p-value < 0.00000000000000022
## 
##  Shapiro-Wilk normality test
## 
## data:  corref$totaldef
## W = 0.16293, p-value < 0.00000000000000022

The p-values are far below 0.05 for both reforestation and deforestation, so the data deviates significantly from a normal distribution. The graph already suggested this result.

As the data is neither normally distributed nor linear, we choose the Spearman method to calculate the correlation.

## 
##  Spearman's rank correlation rho
## 
## data:  corref$totalref and corref$totaldef
## S = 1201838, p-value = 0.0000000000003009
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.4513834

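With a value of 0.451 the test shows a moderate positive correlation that is statistically significant. Countries with more deforestation also tend to have more reforestation, so a relationship exists. The correlation alone does not show that deforestation causes reforestation.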
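
The two tests used in this section can be reproduced with base R as sketched below. The data frame name `corref` and its columns follow the output above, but the values here are simulated skewed data, not the project data, so the resulting numbers will differ from those reported.

```r
set.seed(1)

# Simulated, strongly right-skewed stand-in for the real data.
corref <- data.frame(totaldef = rlnorm(100, meanlog = 2, sdlog = 2))
corref$totalref <- corref$totaldef * rlnorm(100, meanlog = 0, sdlog = 1)

# Shapiro-Wilk test: a small p-value rejects normality.
sw_ref <- shapiro.test(corref$totalref)
sw_def <- shapiro.test(corref$totaldef)

# Spearman's rank correlation works on ranks and therefore
# does not require normally distributed or linearly related data.
res <- cor.test(corref$totalref, corref$totaldef, method = "spearman")

sw_ref$p.value
sw_def$p.value
res$estimate   # rho
res$p.value
```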

8 Forest Destruction

The next part of our project is about forest destruction and tries to answer the following questions:

  • What were the main causes of forest destruction?
  • Which countries were affected the most?
  • How much forest was destroyed by each of these causes?
  • Is there a correlation between rising temperatures and wildfires?
  • Prediction of where and when wildfires are likely to occur

For the first two questions we look at the global scale and at Germany.
For the third question we look at the global scale and at the continent of Europe.

Note: In our interactive shiny website you can pick the country / continent of your interest.

8.1 Main causes of forest destruction

Findings:

  • On a global scale wildfires are clearly the dominant cause of forest destruction over the years.
  • However, this does not apply to every individual country:
    • Germany’s main cause of forest destruction is insects.
    • Note: the peak in this plot was caused by the heat wave in 2003.
  • There is no obvious trend over this short time scale.

8.2 Destroyed forest by cause

Findings:

  • On a global scale wildfires make up more than 50% of forest destruction, destroying more than 1 billion ha of forest.
  • Insects make up almost 25% of forest destruction, destroying roughly 500 million ha of forest.
  • These two are clearly the main drivers of global forest destruction.
  • Germany:
    • Insects and diseases are the main drivers of German forest destruction, destroying approximately 2.5 million ha of forest over this 18-year period.
    • Wildfires are not a significant problem in Germany.

8.3 Most affected countries

Findings:

  • As one would expect from our previous findings, the most affected countries mostly struggle with wildfires.
  • The only exceptions are the USA, Canada, China, Sudan and Mexico.
  • Brazil is the most affected country and has a huge wildfire problem.
    • Assumption: since Brazil is a tropical and therefore very humid region, these wildfires are probably caused by humans.
  • Europe:
    • Except for Russia, Europe’s most affected countries have no problem with wildfires.

8.4 Relation between rising temperatures and wildfires

Overview:

Our first visualization doesn’t suggest any linear correlation, but it can be improved to make the interpretation clearer.

Next we compare global yearly temperature changes to the global forest area destroyed by wildfires.

## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

There is no linear correlation visible between rising temperatures and either land or forest fires.

Now we compare global yearly temperature changes to the global count of wildfires.

Again there is no linear correlation visible between rising temperatures and either land or forest fires. Maybe we can find a correlation if we go into more detail and show not only global values but values for each country and each year:

We need a way to quantify these results. Since the data is clearly not linear, we use the Predictive Power Score [..] as a measure of the relationship.

## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Wildfires-Temperature relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Temperature-Wildfires relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.

Findings:

  • No correlation between temperature increase and wildfires is recognizable in our data. Given the small test sets reported in the warnings above, this result should be read with caution.

8.5 Prediction of where and when wildfires are likely to occur.

First we look at the available data and see which columns might help us to make a prediction.

We have roughly 3800 samples for the prediction.

## x Fold2: preprocessor 1/1, model 1/1 (predictions): Error in model.frame.default(...
## Warning: Cannot retrieve the data used to build the model (so cannot determine roundint and is.binary for the variables).
## To silence this warning:
##     Call rpart.plot with roundint=FALSE,
##     or rebuild the rpart model with model=TRUE.

Findings:

  • Our most accurate prediction depends only on the given country and is therefore not very useful for finding the underlying cause of wildfires.
  • If we remove the country as a predictor, the main factor in deciding whether a wildfire occurs or not is the temperature, which somewhat contradicts our finding for the previous question.
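
The warnings above indicate that a decision tree from the rpart package was fitted. A minimal sketch of such a classification tree follows; it assumes a binary wildfire label and two predictors, uses simulated data, and is not the model or the feature set used in this project.

```r
library(rpart)

set.seed(42)
n <- 200

# Simulated stand-in data: predictors and outcome are assumptions.
fires <- data.frame(
  continent   = factor(sample(c("Africa", "Europe", "Asia"), n, replace = TRUE)),
  temp_change = rnorm(n, mean = 1, sd = 0.5)
)
fires$wildfire <- factor(
  ifelse(fires$temp_change + rnorm(n, sd = 0.3) > 1, "yes", "no")
)

# Simple train/test split.
train_idx <- sample(n, 150)

# model = TRUE stores the model frame, which also avoids the
# rpart.plot warning shown in the output above.
tree <- rpart(wildfire ~ temp_change + continent,
              data = fires[train_idx, ],
              method = "class", model = TRUE)

# Predict on the held-out rows and compute the share of correct predictions.
pred     <- predict(tree, newdata = fires[-train_idx, ], type = "class")
accuracy <- mean(pred == fires$wildfire[-train_idx])

# Shows which predictors the tree relied on most.
tree$variable.importance
```

Leaving a dominant predictor such as the country out of the formula, as described in the findings, shows which of the remaining variables the tree falls back on.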

9 Relation to other environmental issues

9.1 Relation between greenhouse gas emissions and carbon stored in forests

  1. Emission Dataset
  2. Carbon Stock Dataset
  3. Forest Area Dataset

Preprocessing

The first dataset, which contains the volume of carbon emitted per country per year (1990-2020), is processed as follows:

  1. Some columns were processed to make them consistent with the others and easy to identify.
  2. Missing values and irrelevant columns were dropped.
  3. The year columns were pivoted into a longer format (pivot_longer).

9.2 Analysis and Visualizations of GHG Emissions

  • Here we visualize the top 20 emitters of carbon globally within the defined period. The result is shown below.

  • We also wanted to know the proportions of the individual greenhouse gases, hence the plot below.

  • The average emission per country from 1990 - 2020 is visualized below.
## 190 codes from your data successfully matched countries in the map
## 3 codes from your data failed to match with a country code in the map
## 53 codes from the map weren't represented in your data

9.3 Analysis and Visualization of carbon stock

This dataset captures the carbon stock per country for 9 selected years (1990, 2000, 2010, 2015, 2016, 2017, 2018, 2019, 2020).

Preprocessing

  1. Some columns were converted to a consistent data type.
  2. Missing values were imputed with the mice library.
  3. Years not captured in the dataset were filled with the mean carbon stock per country.
  4. The unit of the carbon stock is tonnes/ha, where ha is the forest size in hectares.
  5. Hence the carbon stock was multiplied by the forest area to obtain the total carbon stock.
  6. Finally, the emission dataset was merged with the carbon stock dataset to compute the correlation.
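
Steps 4 to 6 can be illustrated with a small base R sketch. All data frames, column names and values below are hypothetical; the imputation with mice is left out.

```r
# Hypothetical inputs (values are invented).
stock <- data.frame(country = c("A", "B"), year = 2018,
                    carbon_per_ha = c(50, 120))       # tonnes/ha
area <- data.frame(country = c("A", "B"), year = 2018,
                   forest_area_ha = c(1000, 200))     # ha
emissions <- data.frame(country = c("A", "B"), year = 2018,
                        ghg = c(10, 3))

# Join carbon stock per hectare with the forest area.
carbon <- merge(stock, area, by = c("country", "year"))

# tonnes/ha multiplied by ha gives the total carbon stock in tonnes.
carbon$carbon_stock_total <- carbon$carbon_per_ha * carbon$forest_area_ha

# Merge with the emission data for the correlation analysis.
combined <- merge(emissions, carbon, by = c("country", "year"))
```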
  • The global trend of carbon stock from 1990 to 2018 is shown below

  • For comparison, the trend of emitted GHG from 1990 to 2018 is also shown


  • Here we visualized the countries with the largest carbon stock

9.3.0.1 Correlation proper

We seek to answer the correlation question here. We start by comparing both variables per year.

  • Correlation plot

9.3.0.2 Correlation result

## [1] "Kendall =  -0.954415954415954"
## [1] "Spearman =  -0.992673992673993"
## [1] "Pearson =  -0.871214159804568"

Interpretation of the correlation coefficients: the two variables are negatively correlated under all three methods. As emissions increase, the carbon stock tends to decrease. However, a causal relationship cannot be established this way, because correlation does not necessarily imply causation.

Also, the Kendall and Spearman coefficients close to -1 indicate an almost perfectly monotonic relationship, while the Pearson coefficient of about -0.87 indicates a strong but not perfect linear relationship.
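
The difference between the rank-based coefficients and Pearson can be seen in a small base R example. The data below is invented and chosen to be perfectly monotonic but not linear; it is not the project data.

```r
# Invented yearly values: carbon stock falls monotonically but not linearly.
yearly <- data.frame(emissions = 1:10)
yearly$carbon_stock <- 1000 - yearly$emissions^3

# Compute all three coefficients with the same call.
methods <- c("kendall", "spearman", "pearson")
res <- sapply(methods, function(m) {
  cor(yearly$emissions, yearly$carbon_stock, method = m)
})
res
```

Kendall and Spearman only look at the ordering of the values and reach -1 here, while Pearson stays above -1 because the relationship is not a straight line.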

9.4 How much gas will be absorbed if forest area increases

9.4.0.1 Introduction

Here, the objective of our analysis is to investigate how much GHG will be absorbed if forest area is increased. To do this, we use the carbon stock and forest area datasets already loaded.

9.4.0.2 Analysis and Visualization

  • Comparison between absorbed carbon and forest area
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Domain = col_character(),
##   Area = col_character(),
##   Element = col_character(),
##   Item = col_character(),
##   Year = col_double(),
##   Unit = col_character(),
##   Value = col_double()
## )
## 
##  iter imp variable
##   1   1  carbonStock
##   1   2  carbonStock
##   1   3  carbonStock
##   1   4  carbonStock
##   1   5  carbonStock
##   2   1  carbonStock
##   2   2  carbonStock
##   2   3  carbonStock
##   2   4  carbonStock
##   2   5  carbonStock
##   3   1  carbonStock
##   3   2  carbonStock
##   3   3  carbonStock
##   3   4  carbonStock
##   3   5  carbonStock
##   4   1  carbonStock
##   4   2  carbonStock
##   4   3  carbonStock
##   4   4  carbonStock
##   4   5  carbonStock
##   5   1  carbonStock
##   5   2  carbonStock
##   5   3  carbonStock
##   5   4  carbonStock
##   5   5  carbonStock

The result below was obtained using linear regression.

9.4.0.3 Result

  • Carbon_absorbed = 392 + 0.00243 * totalArea

  • For a unit increase in total forest area, the volume of carbon absorbed is estimated to increase by 0.00243.

  • The intercept of 392 is the model’s value for totalArea = 0. It is an extrapolation outside the observed data and should not be read as carbon absorbed without any forest.

  • Therefore, under this model, each additional unit of forest area absorbs a further 0.00243 units of carbon, which could offset the same quantity of emitted CO2.
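
A linear model of this form can be fitted and read off with base R as sketched below. The data is simulated around the coefficients reported above purely for illustration; the variable names `carbon_absorbed` and `totalArea` mirror the formula, and the units are left unspecified as in the report.

```r
set.seed(7)

# Simulated data around the reported line (not the project data).
df <- data.frame(totalArea = seq(1000, 100000, length.out = 50))
df$carbon_absorbed <- 392 + 0.00243 * df$totalArea + rnorm(50, sd = 5)

# Fit the linear regression.
fit <- lm(carbon_absorbed ~ totalArea, data = df)

# Intercept and slope: the slope is the change in absorbed carbon
# per unit of additional forest area.
coef(fit)

# Predicted absorption for a forest area inside the observed range.
pred <- predict(fit, newdata = data.frame(totalArea = 50000))
pred
```

Predictions are most trustworthy inside the observed range of forest areas; the intercept lies outside of it.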

9.5 Relation between forest area and air pollution

9.5.1 TODO: which data set is used here?

The air pollution data contains air quality values in micrograms per cubic metre for the years 1990 to 2019. However, data for a few years is not available. The air quality values show changes in the amount of pollution in the air.

9.5.2 Findings:

  • No linear relationship between Forest Area and Air Pollution

9.6 Correlation Value

## [1] "Kendall =  -0.032967032967033"

Findings: The correlation coefficient is negative but very close to 0, so there is practically no monotonic relationship between the two variables.

9.7 Regression Analysis

9.8 Fit model and estimate parameters

## parsnip model object
## 
## Fit time:  3ms 
## 
## Call:
## stats::lm(formula = ForestArea ~ AirPollution, data = data)
## 
## Coefficients:
##  (Intercept)  AirPollution  
##        15732          -116

Findings:

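9.9 Extrapolation

Findings: The p-value of the slope indicates whether Forest Area and Air Pollution are linearly related. Note that a very low p-value would be evidence against independence; only a high p-value would be consistent with the two variables being independent of each other.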

9.10 Residual Plot

## `geom_smooth()` using formula 'y ~ x'

Finding:

  • As the variables in our dataset are not linearly associated, the residual plot is used here to assess the appropriateness of our linear regression.

  • In the plot the residuals of many data points are far from zero, which shows that the model is not a good fit.

9.10.1 For more reliable results on the relationship between Forest Area and Air Pollution, we use the Predictive Power Score

## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 2.8 observations in each test-set for the ForestArea-AirPollution relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 2.8 observations in each test-set for the AirPollution-ForestArea relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.

Findings:

A Predictive Power Score of 0 shows no relationship between Forest Area and Air Pollution. As the warnings above note, the test sets are very small, so this score is unstable.

9.11 Top 10 countries with most air pollution

## Selecting by avgAirPollution

Findings:

India has the highest air pollution in the last 30 years

9.12 Least air pollution

## Selecting by avgAirPollution

Finding: Nauru has the least air pollution in the last 30 years.

9.12.1 Air pollution trend in India

Findings:

  • Air pollution drastically increased after the year 2005.
  • It was highest in 2012.
  • It decreased in 2005 by ~10 micrograms per cubic metre.
  • Sudden decrease around 2017.
  • It increased slightly again in 2019.

9.12.2 Air pollution trend in Nauru

Findings:

  • Air pollution varies in the range of 5 micrograms per cubic metre to ~7 micrograms per cubic metre.
  • It was lowest in 2014.

9.13 Required forests for current CO2 emissions

The map below highlights the countries which have to increase their forest area the most to tackle their current CO2 emissions. As our dataset only provides CO2 emission data from 1990-2018, we used an ARIMA time series model to predict future CO2 emissions. Assuming that one acre of forest can absorb about 2.5 tons of carbon annually, the predicted CO2 value is used to find the required amount of additional forest area.
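
The calculation described above can be sketched in base R as follows. The emission series is invented, and the ARIMA(0,1,0) order (a random walk) is chosen here only to keep the example simple and deterministic; the model order actually used is not stated in this report. The conversion to hectares is an added convenience.

```r
# Invented yearly CO2 emissions of one country, in tons.
co2 <- ts(c(100, 104, 109, 115, 118, 125), start = 2013)

# Fit a simple ARIMA model; order (0,1,0) is an assumption, see above.
fit <- arima(co2, order = c(0, 1, 0))

# Forecast the next two years. A random walk predicts the last value.
pred <- predict(fit, n.ahead = 2)$pred

# One acre of forest is assumed to absorb about 2.5 tons per year.
tons_per_acre  <- 2.5
required_acres <- as.numeric(pred) / tons_per_acre

# Convert acres to hectares (1 acre is about 0.404686 ha).
required_ha <- required_acres * 0.404686
required_acres
```

With a forecast of 125 tons per year, 50 acres of forest would be required under the stated absorption rate.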

10 Final Analysis